Rotational Properties of Vocal Tract Length Difference in Cepstral Space
Abstract
In this paper, we prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as a rotation in cepstrum space. In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age- and gender-related differences. VTLN often applies a frequency warping, which can be modeled as a linear transform in cepstrum space: ĉ = Ac. In this study, the geometric properties of the transformation matrix A are clarified using n-dimensional geometry, and it is shown that the matrix can be approximated by a rotation matrix. For a better approximation, a new method is proposed: using the eigenvalues of A, its quasi-rotational distortion is factorized into multiple true rotation operations and multiple magnification operations. This decomposition resolves the intrinsic ambiguity of the rotation angle defined through the inner product, and it describes the detailed geometric properties of the transformation caused by vocal tract length normalization. Experimental results using real and resynthesized speech samples demonstrate that the difference between cepstrum vectors extracted from different speakers is represented as rotation and magnification, and that the eigenvalue-based decomposition captures it precisely.
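The eigenvalue-based factorization described in the abstract can be illustrated with a minimal NumPy sketch. It rests on a standard linear-algebra fact: for a real matrix, each complex-conjugate eigenvalue pair r·e^{±iθ} corresponds to a rotation by θ combined with a magnification by r in a two-dimensional invariant plane, and each real eigenvalue corresponds to a pure magnification. The matrix below is a toy example built for illustration, not the VTLN warping matrix A from the paper, and all function names (`rotation_block`, `decompose_by_eigenvalues`, `inner_product_angle`, `make_toy_matrix`) are hypothetical.

```python
import numpy as np

def rotation_block(theta, scale=1.0):
    """2x2 rotation by theta, multiplied by a magnification factor."""
    c, s = np.cos(theta), np.sin(theta)
    return scale * np.array([[c, -s], [s, c]])

def decompose_by_eigenvalues(A, tol=1e-9):
    """Read rotation angles and magnifications off the eigenvalues of A.

    Returns (planes, real_scales): planes is a list of (angle, magnification)
    tuples, one per complex-conjugate eigenvalue pair, sorted by angle;
    real_scales lists the real eigenvalues (pure magnifications).
    """
    eigvals = np.linalg.eigvals(A)
    # Keep one representative of each conjugate pair (positive imaginary part).
    upper = eigvals[eigvals.imag > tol]
    planes = sorted((float(np.angle(z)), float(np.abs(z))) for z in upper)
    real_scales = sorted(float(x) for x in eigvals[np.abs(eigvals.imag) <= tol].real)
    return planes, real_scales

def inner_product_angle(A, c):
    """Angle between c and A @ c, computed from the normalized inner product."""
    c_hat = A @ c
    cos_t = (c @ c_hat) / (np.linalg.norm(c) * np.linalg.norm(c_hat))
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def make_toy_matrix(seed=0):
    """Toy 5x5 matrix (an assumption for illustration only): two rotated and
    magnified planes plus one purely magnified axis, in a random orthonormal basis."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
    B = np.zeros((5, 5))
    B[0:2, 0:2] = rotation_block(0.3, 1.1)
    B[2:4, 2:4] = rotation_block(0.8, 0.9)
    B[4, 4] = 1.2
    return Q @ B @ Q.T, Q

if __name__ == "__main__":
    A, Q = make_toy_matrix()
    planes, real_scales = decompose_by_eigenvalues(A)
    print("rotation planes (angle, magnification):", planes)
    print("pure magnifications:", real_scales)
    # The inner-product angle depends on which vector is transformed.
    for name, c in [("plane 1", Q[:, 0]), ("plane 2", Q[:, 2]), ("mixed", Q[:, 0] + Q[:, 2])]:
        print(name, inner_product_angle(A, c))
```

In this toy setting the inner-product angle changes with the input vector, which is one way to read the ambiguity the abstract mentions, whereas the eigenvalues give a fixed set of angles and magnifications for the matrix itself.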
Related articles
Feature space normalization in adverse acoustic conditions
We study the effect of different feature space normalization techniques in adverse acoustic conditions. Recognition tests are reported for cepstral mean and variance normalization, histogram normalization, feature space rotation, and vocal tract length normalization on a German isolated word recognition task with large acoustic mismatch. The training data was recorded in clean office environmen...
Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification
This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt the cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. The cepstral fea...
New transformations of cepstral parameters for automatic vocal tract length normalization in speech recognition
This paper proposes a method to transform acoustic models (HMM gaussian mixtures) that have been trained on a certain group of speakers for use on speech from a different group of speakers. Cepstral features are transformed on the basis of assumptions regarding the difference in vocal tract length (VTL) between the groups of speakers (VTL normalisation, VTLN). Firstly, the VTL of these groups h...
Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation
We present a method that determines the optimal configuration of a bilinear vocal tract length normalization function to transform the frequency axis of one voice according to a specific target voice. Given a number of parallel utterances of the involved speakers, the single parameter of this function can be calculated through an iterative procedure by minimizing an objective error measure defi...
Cepstral and linear prediction techniques for improving intelligibility and audibility of impaired speech
Human speech becomes impaired i.e., unintelligible due to a variety of reasons that can be either neurological or anatomical. The objective of the research was to improve the intelligibility and audibility of the impaired speech that resulted from a disabled human speech mechanism with impairment in the acoustic system-the supra-laryngeal vocal tract. For this purpose three methods are presente...
Publication date: 2011